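In this paper, we revisit the problems of differentially private empirical risk minimization (DP-ERM) and differentially private stochastic convex optimization (DP-SCO). We show that a well-studied continuous-time algorithm from statistical physics, Langevin diffusion (LD), simultaneously provides optimal privacy/utility trade-offs for both DP-ERM and DP-SCO, under $\epsilon$-DP and $(\epsilon, \delta)$-DP, for both convex and strongly convex loss functions. We provide new time- and dimension-independent uniform stability properties of LD, which we use to obtain the corresponding optimal excess population risk guarantees for $\epsilon$-DP. An important attribute of our DP-SCO guarantees for $\epsilon$-DP is that they match the non-private optimal bounds as $\epsilon \to \infty$. Along the way, we provide various technical tools that may be of independent interest: i) a new R\'enyi divergence bound for LD when run on loss functions over two neighboring data sets, ii) excess empirical risk bounds for last-iterate LD, analogous to those of Shamir and Zhang for noisy stochastic gradient descent (SGD), and iii) a two-phase excess risk analysis of LD, where in the first phase the diffusion has not converged in any reasonable sense to a stationary distribution, and in the second phase the diffusion has converged to a variant of the Gibbs distribution. Our universality results rely crucially on the dynamics of LD. When it has converged to a stationary distribution, we obtain the optimal bounds under $\epsilon$-DP. When it is run only for a very short time $\propto 1/p$, we obtain the optimal bounds under $(\epsilon, \delta)$-DP. Here, $p$ is the dimensionality of the model space.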
Translated by Google Translate
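The Langevin diffusion abstract above describes a continuous-time process. As a rough illustration only, the sketch below simulates an Euler-Maruyama discretization of $d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{2/\beta}\,dW_t$, whose stationary density is proportional to $\exp(-\beta L(\theta))$. The squared-error loss, the step size, the inverse temperature `beta`, and all function names are assumptions made for this example; nothing here is calibrated to a privacy budget, and the discretization is not the algorithm analyzed in the abstract.

```python
import math
import random


def langevin_step(theta, grad, step, beta, rng):
    """One Euler-Maruyama step of d(theta) = -grad L(theta) dt + sqrt(2/beta) dW."""
    noise_scale = math.sqrt(2.0 * step / beta)
    g = grad(theta)
    return [t - step * gi + noise_scale * rng.gauss(0.0, 1.0)
            for t, gi in zip(theta, g)]


def run_langevin(theta0, grad, step, beta, num_steps, seed=0):
    """Run the discretized diffusion for num_steps steps and return the last iterate."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(num_steps):
        theta = langevin_step(theta, grad, step, beta, rng)
    return theta


def make_mean_loss_grad(data):
    """Illustrative loss L(theta) = (1/2n) * sum_i ||theta - x_i||^2.

    Its gradient is theta - mean(x), so the minimizer is the data mean.
    Returns the gradient function and the mean (for reference).
    """
    n = len(data)
    dim = len(data[0])
    mean = [sum(x[j] for x in data) / n for j in range(dim)]

    def grad(theta):
        return [t - m for t, m in zip(theta, mean)]

    return grad, mean
```

Calling `run_langevin` with a small `num_steps` mimics the short-run regime mentioned in the abstract, while a large `num_steps` lets the iterate approach a sample from the stationary distribution; a larger `beta` concentrates that distribution more tightly around the minimizer.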
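Deep learning (DL) models have provided state-of-the-art performance in a wide variety of medical imaging benchmarking challenges, including the Brain Tumor Segmentation (BraTS) challenges. However, the task of focal pathology multi-compartment segmentation (e.g., tumor and lesion sub-regions) is particularly challenging, and potential errors hinder the translation of DL models into clinical workflows. Quantifying the reliability of DL model predictions in the form of uncertainties could enable clinical review of the most uncertain regions, thereby building trust and paving the way towards clinical translation. Recently, a number of uncertainty estimation methods have been introduced for DL medical image segmentation tasks. Developing metrics to evaluate and compare the performance of uncertainty measures will assist the end user in making more informed decisions. In this study, we explore and evaluate a metric developed during the BraTS 2019-2020 task on uncertainty quantification (QU-BraTS), designed to assess and rank uncertainty estimates for brain tumor multi-compartment segmentation. This metric (1) rewards uncertainty estimates that produce high confidence in correct assertions and those that assign low confidence to incorrect assertions, and (2) penalizes uncertainty measures that lead to a higher percentage of under-confident correct assertions. We further benchmark the segmentation uncertainties generated by 14 independent teams participating in QU-BraTS 2020, all of which also participated in the main BraTS segmentation task. Overall, our findings confirm the importance and complementary value that uncertainty estimates provide to segmentation algorithms, and hence highlight the need for uncertainty quantification in medical image analysis. Our evaluation code is made publicly available at https://github.com/ragmeh11/qu-brats.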
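The two properties of the metric described above can be made concrete with a small sketch. It assumes a per-voxel uncertainty in the range 0 to 100 and a threshold `tau`: voxels more uncertain than `tau` are set aside, Dice is computed on the rest, and the share of true positives that were set aside is reported. This is only one plausible way to operationalize "reward confident correct assertions, penalize under-confident correct ones"; the function names, the thresholding scheme, and the toy data are assumptions, and the official scoring is defined by the evaluation code linked in the abstract.

```python
def dice(pred, truth):
    """Dice overlap between two binary label lists (1.0 if both are empty)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    denom = sum(pred) + sum(truth)
    return 1.0 if denom == 0 else 2.0 * tp / denom


def filter_at_threshold(pred, truth, uncertainty, tau):
    """Keep voxels with uncertainty <= tau.

    Returns (Dice on the kept voxels, fraction of true positives filtered out).
    A good uncertainty measure raises the first value as tau decreases
    while keeping the second value small.
    """
    kept = [i for i, u in enumerate(uncertainty) if u <= tau]
    kept_pred = [pred[i] for i in kept]
    kept_truth = [truth[i] for i in kept]
    tp_all = sum(1 for p, t in zip(pred, truth) if p and t)
    tp_kept = sum(1 for p, t in zip(kept_pred, kept_truth) if p and t)
    filtered_tp_ratio = 0.0 if tp_all == 0 else (tp_all - tp_kept) / tp_all
    return dice(kept_pred, kept_truth), filtered_tp_ratio


# Toy example (hypothetical data): the two wrong voxels are indices 2 and 5.
pred = [1, 1, 1, 0, 0, 0]
truth = [1, 1, 0, 0, 0, 1]
good_uncertainty = [5, 10, 90, 5, 10, 80]   # uncertain exactly where wrong
bad_uncertainty = [90, 80, 5, 5, 10, 5]     # uncertain exactly where right
```

With `good_uncertainty`, filtering at `tau=50` removes only the erroneous voxels, so Dice rises and no true positives are lost. With `bad_uncertainty`, the same threshold discards the correct voxels and keeps the errors, which is the behavior the second property of the metric is meant to penalize.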
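We revisit the problem of using public data to improve the privacy/utility trade-offs of differentially private (DP) model training. Here, public data refers to auxiliary data sets that have no privacy concerns. We consider public data drawn from the same distribution as the private training data. For convex losses, we show that a variant of mirror descent provides population risk guarantees that are independent of the dimension of the model ($p$). Specifically, we apply mirror descent with the loss generated by the public data as the mirror map, and use DP gradients of the loss generated by the private (sensitive) data. To obtain dimension independence, we require $G_Q^2 \leq p$ public data samples, where $G_Q$ is a measure of the isotropy of the loss function. We further show that our algorithm has a natural "noise stability" property: if, around the current iterate, the public loss satisfies $\alpha_v$-strong convexity in a direction $v$, then using noisy gradients instead of exact gradients shifts our next iterate in the direction $v$ by an amount proportional to $1/\alpha_v$ (in contrast with DP-SGD, where the shift is isotropic). Analogous results in prior work had to explicitly learn the geometry from the public data in the form of preconditioner matrices. Our method is also applicable to non-convex losses, as it does not rely on convexity assumptions to ensure DP guarantees. We demonstrate the empirical efficacy of our algorithm by showing privacy/utility trade-offs on linear regression, deep learning benchmark datasets (WikiText-2, CIFAR-10, and EMNIST), and federated learning (StackOverflow). We show that our algorithm not only significantly improves over traditional DP-SGD and DP-FedAvg, which do not have access to public data, but also improves over DP-SGD and DP-FedAvg applied to models that have been pre-trained with the public data to begin with.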
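The mirror descent update described above, and its "noise stability" property, can be sketched in a deliberately simple special case. Mirror descent with mirror map $\Psi$ updates via $\nabla\Psi(\theta_{t+1}) = \nabla\Psi(\theta_t) - \eta g_t$. If we assume, purely for illustration, that the public loss is a diagonal quadratic $\Psi(\theta) = \tfrac{1}{2}\sum_j h_j \theta_j^2 - b^\top\theta$, the update becomes $\theta_{t+1,j} = \theta_{t,j} - \eta\, g_{t,j}/h_j$, so noise in coordinate $j$ is damped by that coordinate's curvature $h_j$. The clipping-plus-Gaussian-noise gradient below is the standard DP-SGD style construction; `noise_std` is a free parameter here and is not calibrated to any $(\epsilon, \delta)$. All names are hypothetical.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec down so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def noisy_private_gradient(per_example_grads, clip_norm, noise_std, rng):
    """Average of clipped per-example gradients with Gaussian noise added to the sum."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        for j, v in enumerate(clip(g, clip_norm)):
            total[j] += v
    return [(t + rng.gauss(0.0, noise_std)) / n for t in total]


def mirror_step_diag_quadratic(theta, grad, curvature, lr):
    """Mirror descent step for the assumed diagonal quadratic mirror map.

    Solves grad Psi(theta_next) = grad Psi(theta) - lr * grad, which for
    Psi(theta) = 0.5 * sum_j h_j * theta_j^2 - b . theta gives
    theta_next_j = theta_j - lr * grad_j / h_j.
    """
    return [t - lr * g / h for t, g, h in zip(theta, grad, curvature)]
```

For example, `mirror_step_diag_quadratic([0.0, 0.0], [1.0, 1.0], [1.0, 10.0], 0.5)` moves the first coordinate ten times farther than the second, even though the gradient (or noise) is the same in both: the direction with curvature 10 absorbs one tenth of the shift, in contrast to a plain gradient step, where both coordinates would move equally.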
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. We give an $(\epsilon_{\text{dp}}, \delta)$-differentially private algorithm which, given $n$ samples of Lipschitz loss functions, obtains near-optimal optimization error and makes $\min(n, n^2\epsilon_{\text{dp}}^2 d^{-1}) + \min(n^{4/3}\epsilon_{\text{dp}}^{1/3}, (nd)^{2/3}\epsilon_{\text{dp}}^{-1})$ queries to the gradients of these functions. In the regime $d \le n \epsilon_{\text{dp}}^{2}$, where privacy comes at no cost in terms of the optimal loss up to constants, our algorithm uses $n + (nd)^{2/3}\epsilon_{\text{dp}}^{-1}$ queries and improves recent advancements of [KLL21, AFKT21]. In the moderately low-dimensional setting $d \le \sqrt n \epsilon_{\text{dp}}^{3/2}$, our query complexity is near-linear.
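The reweighting idea behind an estimator for the gradient of a Gaussian-convolved function can be sketched as follows. Write $\hat f_\rho(x) = \mathbb{E}_{\xi \sim \mathcal{N}(0, \rho^2 I)} f(x + \xi)$. Its gradient at $x$ can be estimated from samples drawn around a fixed reference point $\bar{x}$ by importance reweighting: for $y \sim \mathcal{N}(\bar{x}, \rho^2 I)$, the quantity $\frac{\gamma_\rho(y - x)}{\gamma_\rho(y - \bar{x})} \nabla f(y)$ has expectation $\nabla \hat f_\rho(x)$, where $\gamma_\rho$ is the $\mathcal{N}(0, \rho^2 I)$ density. The code below is a minimal sketch of that identity under these assumptions; the function names and the averaging over `num_samples` draws are illustrative choices, and the precise estimator and its use with ball oracle acceleration are defined in the paper itself.

```python
import math
import random


def log_gaussian_ratio(y, x, x_ref, rho):
    """log of N(y; x, rho^2 I) / N(y; x_ref, rho^2 I); the normalizers cancel."""
    d_x = sum((a - b) ** 2 for a, b in zip(y, x))
    d_ref = sum((a - b) ** 2 for a, b in zip(y, x_ref))
    return (d_ref - d_x) / (2.0 * rho ** 2)


def reweighted_gradient_estimate(grad, x, x_ref, rho, num_samples, seed=0):
    """Estimate the gradient of the Gaussian-smoothed function at x.

    Samples are drawn around x_ref (not x) and reweighted by the density
    ratio, so the same draws could be reused for several nearby points x.
    """
    rng = random.Random(seed)
    dim = len(x)
    total = [0.0] * dim
    for _ in range(num_samples):
        y = [c + rho * rng.gauss(0.0, 1.0) for c in x_ref]
        weight = math.exp(log_gaussian_ratio(y, x, x_ref, rho))
        g = grad(y)
        for j in range(dim):
            total[j] += weight * g[j]
    return [t / num_samples for t in total]
```

As a sanity check, for $f(y) = \tfrac{1}{2}\|y\|^2$ the smoothed gradient at $x$ is exactly $x$, so `reweighted_gradient_estimate(lambda y: list(y), x, x_ref, rho, n)` should approach `x` as `n` grows. The weights stay well behaved only when $x$ is close to $\bar{x}$ relative to $\rho$, since their variance grows with $\|x - \bar{x}\|/\rho$.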
Periocular refers to the region of the face that surrounds the eye socket. This is a feature-rich area that can be used by itself to determine the identity of an individual. It is especially useful when the iris or the face cannot be reliably acquired. This can be the case of unconstrained or uncooperative scenarios, where the face may appear partially occluded, or the subject-to-camera distance may be high. However, it has received revived attention during the pandemic due to masked faces, leaving the ocular region as the only visible facial area, even in controlled scenarios. This paper discusses the state-of-the-art of periocular biometrics, giving an overall framework of its most significant research aspects.
Recently, Smart Video Surveillance (SVS) systems have been receiving more attention among scholars and developers as a substitute for the current passive surveillance systems. These systems are used to make policing and monitoring more efficient and to improve public safety. However, because these systems monitor the public's daily activities, they raise a range of ethical challenges. There are different approaches for addressing privacy issues in implementing SVS. In this paper, we focus on the role of design in light of the ethical and privacy challenges of SVS. Reviewing four policy protection regulations that generate an overview of best practices for privacy protection, we argue that ethical and privacy concerns could be addressed through four lenses: algorithm, system, model, and data. As a case study, we describe our proposed system and illustrate how it can create a baseline for designing a privacy-preserving system that delivers safety to society. We used several Artificial Intelligence algorithms, such as object detection, single- and multi-camera re-identification, action recognition, and anomaly detection, to provide a basic functional system. We also use cloud-native services to implement a smartphone application in order to deliver the outputs to the end users.
As text generated by large language models proliferates, it becomes vital to understand how humans engage with such text, and whether or not they are able to detect when the text they are reading did not originate with a human writer. Prior work on human detection of generated text focuses on the case where an entire passage is either human-written or machine-generated. In this paper, we study a more realistic setting where text begins as human-written and transitions to being generated by state-of-the-art neural language models. We show that, while annotators often struggle at this task, there is substantial variance in annotator skill and that given proper incentives, annotators can improve at this task over time. Furthermore, we conduct a detailed comparison study and analyze how a variety of variables (model size, decoding strategy, fine-tuning, prompt genre, etc.) affect human detection performance. Finally, we collect error annotations from our participants and use them to show that certain textual genres influence models to make different types of errors and that certain sentence-level features correlate highly with annotator selection. We release the RoFT dataset: a collection of over 21,000 human annotations paired with error classifications to encourage future work in human detection and evaluation of generated text.
This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem, where one cannot infer the loss incurred for selecting an action from observed feedback. We also introduce a new setup named Censored Semi Bandits, where the loss incurred for selecting an action can be observed under certain conditions. Finally, we study the channel selection problem in communication networks, where the reward for an action is only observed when no other player selects that action in the same round. These problems find applications in many fields, such as healthcare, crowd-sourcing, security, and adaptive resource allocation. This thesis aims to address the above-described sequential decision problems by exploiting the specific structures these problems exhibit. We develop provably optimal algorithms for each of these setups with weak feedback and validate their empirical performance on different problem instances derived from synthetic and real datasets.
Lack of factual correctness is an issue that still plagues state-of-the-art summarization systems despite their impressive progress on generating seemingly fluent summaries. In this paper, we show that factual inconsistency can be caused by irrelevant parts of the input text, which act as confounders. To that end, we leverage information-theoretic measures of causal effects to quantify the amount of confounding and how it affects summarization performance. Based on insights derived from our theoretical results, we design a simple multi-task model to control such confounding by leveraging human-annotated relevant sentences when available. Crucially, we give a principled characterization of data distributions where such confounding can be large, thereby necessitating the use of human-annotated relevant sentences to generate factual summaries. Our approach improves faithfulness scores by 20\% over strong baselines on AnswerSumm \citep{fabbri2021answersumm}, a conversation summarization dataset where lack of faithfulness is a significant issue due to the subjective nature of the task. Our best method achieves the highest faithfulness score while also achieving state-of-the-art results on standard metrics like ROUGE and METEOR. We corroborate these improvements through human evaluation.